Data-driven speech representations for NMF-based word learning

Authors

  • Joris Driesen
  • Jort F. Gemmeke
  • Hugo Van hamme
Abstract

State-of-the-art solutions in ASR often rely on large amounts of expert prior knowledge, which is undesirable in some applications. In this paper, we consider an NMF-based framework that learns a small vocabulary of words directly from input data, without prior knowledge such as phone sets and dictionaries. In the context of this learning scheme, we compare several spectral representations of speech. Where necessary, we propose changes to their derivation to avoid the use of prior linguistic knowledge. Also, in a comparison of several acoustic modelling techniques, we determine which model properties are beneficial to the framework’s performance.
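The abstract does not spell out the factorisation itself, so the following is only a minimal sketch of generic NMF, not the authors' implementation. It assumes the widely used multiplicative updates for the generalised Kullback-Leibler divergence; the function names `nmf_kl` and `kl_divergence`, the layout of `V` (one column per utterance), and all parameter values are illustrative assumptions. How utterance-level features are derived from speech, and how word identities are tied to them, is not specified in the abstract and is left out here.

```python
import numpy as np


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix V (features x samples) as W @ H.

    Assumption: multiplicative updates for the generalised KL divergence.
    The abstract does not state which cost function or update rule is used.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    # Random strictly positive initialisation keeps the updates well defined.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Update activations H while W is held fixed.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update basis vectors W while H is held fixed.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def kl_divergence(V, WH, eps=1e-12):
    """Generalised KL divergence between V and its reconstruction WH."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))
```

In a toy setting, `V` could hold one non-negative feature vector per utterance; the columns of `W` would then play the role of recurring patterns and `H` their per-utterance activations. Because the updates only multiply by non-negative factors, `W` and `H` stay non-negative throughout.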

Similar resources

Discovering hierarchical speech features using convolutional non-negative matrix factorization

Discovering a representation that reflects the structure of a dataset is a first step for many inference and learning methods. This paper aims at finding a hierarchy of localized speech features that can be interpreted as parts. Non-negative matrix factorization (NMF) has been proposed recently for the discovery of parts-based localized additive representations. Here, I propose a variant of thi...

Discovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints

Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by Nonnegative Matrix Factorisation (NMF), which is a method for finding parts-based representations of non-negative data. Here, we present an extension to convolutive NMF that includes a sparseness cons...

Implementation and test of activation-verification mechanisms and embedding in ACORNS

We describe a bottom-up, activation-based paradigm for continuous speech recognition. Speech is represented by co-occurrence statistics of acoustic events over an analysis window of variable length, leading to a vector representation of high but fixed dimension called “Histogram of Acoustic Co-occurrence” (HAC). During training, recurring acoustic patterns are discovered ...

A Deep Non-Negative Matrix Factorization Neural Network

Recently, deep neural network algorithms have emerged as one of the most successful machine learning strategies, obtaining state-of-the-art results for speech recognition, computer vision, and classification of large data sets. Their success is due to advances in computing power, the availability of massive amounts of data, and the development of new computational techniques. Some of the drawback...

Discovering Words from Continuous Speech

Modern speech recognizers rely on preprogrammed knowledge of specific languages and extensive examples including annotations. Children, however, are remarkably well-adapted to learning language without such help, learning from examples of the speech itself and from the environment in which they live. Modelling this learning process is a very interesting but also a very complex topic which encompas...


Publication date: 2012